Voicing classification of visual speech using convolutional neural networks
نویسندگان
چکیده
The application of neural network and convolutional neural network (CNN) architectures is explored for the tasks of voicing classification (classifying frames as being either non-speech, unvoiced, or voiced) and voice activity detection (VAD) of visual speech. Experiments are conducted for both speaker dependent and speaker independent scenarios. A Gaussian mixture model (GMM) baseline system is developed using standard image-based two-dimensional discrete cosine transform (2D-DCT) visual speech features, achieving speaker dependent accuracies of 79 % and 94 %, for voicing classification and VAD respectively. Additionally, a singlelayer neural network system trained using the same visual features achieves accuracies of 86 % and 97 %. A novel technique using convolutional neural networks for visual speech feature extraction and classification is presented. The voicing classification and VAD results using the system are further improved to 88 % and 98 % respectively. The speaker independent results show the neural network system to outperform both the GMM and CNN systems, achieving accuracies of 63 % for voicing classification, and 79 % for voice activity detection.
منابع مشابه
Hand Gesture Recognition from RGB-D Data using 2D and 3D Convolutional Neural Networks: a comparative study
Despite considerable enhances in recognizing hand gestures from still images, there are still many challenges in the classification of hand gestures in videos. The latter comes with more challenges, including higher computational complexity and arduous task of representing temporal features. Hand movement dynamics, represented by temporal features, have to be extracted by analyzing the total fr...
متن کاملProvide a Deep Convolutional Neural Network Optimized with Morphological Filters to Map Trees in Urban Environments Using Aerial Imagery
Today, we cannot ignore the role of trees in the quality of human life, so that the earth is inconceivable for humans without the presence of trees. In addition to their natural role, urban trees are also very important in terms of visual beauty. Aerial imagery using unmanned platforms with very high spatial resolution is available today. Convolutional neural networks based deep learning method...
متن کاملA New Method to Improve Automated Classification of Heart Sound Signals: Filter Bank Learning in Convolutional Neural Networks
Introduction: Recent studies have acknowledged the potential of convolutional neural networks (CNNs) in distinguishing healthy and morbid samples by using heart sound analyses. Unfortunately the performance of CNNs is highly dependent on the filtering procedure which is applied to signal in their convolutional layer. The present study aimed to address this problem by a...
متن کاملشبکه عصبی پیچشی با پنجرههای قابل تطبیق برای بازشناسی گفتار
Although, speech recognition systems are widely used and their accuracies are continuously increased, there is a considerable performance gap between their accuracies and human recognition ability. This is partially due to high speaker variations in speech signal. Deep neural networks are among the best tools for acoustic modeling. Recently, using hybrid deep neural network and hidden Markov mo...
متن کاملDecision Support System for Age-Related Macular Degeneration Using Convolutional Neural Networks
Introduction: Age-related macular degeneration (AMD) is one of the major causes of visual loss among the elderly. It causes degeneration of cells in the macula. Early diagnosis can be helpful in preventing blindness. Drusen are the initial symptoms of AMD. Since drusen have a wide variety, locating them in screening images is difficult and time-consuming. An automated digital fundus photography...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2015